BUG: .describe() doesn't work for EAs #61707 #61760
Signed-off-by: ianlv <[email protected]>
* DEPR: object inference in to_stata * Whatsnew * Fix broken test * alphabetize
…as-dev#61767) Revert "ENH: Allow third-party packages to register IO engines (pandas-dev#61642)" This reverts commit 9dcce63.
…61705) Co-authored-by: Simon Hawkins <[email protected]> Co-authored-by: jbrockmendel <[email protected]>
…) to 2.3 whatsnew notes (pandas-dev#61795) Co-authored-by: Simon Hawkins <[email protected]>
…1771) Co-authored-by: Joris Van den Bossche <[email protected]>
* CLN: remove and udpate for outdated _item_cache * CLN: remove outdated _item_cache in comment * CLN: rollback unittest unralted to _item_cache
* PERF: avoid object-dtype path in ArrowEA._explode * typo fixup
pandas-dev#61773) * BUG: Decimal(NaN) incorrectly allowed in ArrowEA constructor with timestamp type * GH ref * BUG: ArrowEA constructor with timestamp type * mypy fixup * mypy fixup
…1785) * REF: remove unreachable, stronger typing in parsers.pyx * mypy fixup
* [pre-commit.ci] pre-commit autoupdate updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.12 → v0.12.2](astral-sh/ruff-pre-commit@v0.11.12...v0.12.2) - [github.com/MarcoGorelli/cython-lint: v0.16.6 → v0.16.7](MarcoGorelli/cython-lint@v0.16.6...v0.16.7) - [github.com/pre-commit/mirrors-clang-format: v20.1.5 → v20.1.7](pre-commit/mirrors-clang-format@v20.1.5...v20.1.7) - [github.com/trim21/pre-commit-mirror-meson: v1.8.1 → v1.8.2](trim21/pre-commit-mirror-meson@v1.8.1...v1.8.2) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Rename method * ignore PLW0177 * Noqa test --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Matthew Roeschke <[email protected]>
* Bump numpy * Bump numpy * Bump tzdata * ignore pytables usage, update xfail condition
…_csv (pandas-dev#61650) * feature pandas-dev#49580: support new-style float_format string in to_csv feat(to_csv): support new-style float_format strings using str.format Detect and process new-style format strings (e.g., "{:,.2f}") in the float_format parameter of to_csv. - Check if float_format is a string and matches new-style pattern - Convert it to a callable (e.g., lambda x: float_format.format(x)) - Ensure compatibility with NaN values and mixed data types - Improves formatting output for floats when exporting to CSV Example: df = pd.DataFrame([1234.56789, 9876.54321]) df.to_csv(float_format="{:,.2f}") # now outputs formatted values like 1,234.57 Co-authored-by: Pedro Santos <[email protected]> * update benchmark test * fixed pre commit * fixed offsets.pyx * fixed tests to windows * Update pandas/io/formats/format.py Co-authored-by: Matthew Roeschke <[email protected]> * Update pandas/io/formats/format.py Co-authored-by: Matthew Roeschke <[email protected]> * Update pandas/io/formats/format.py Co-authored-by: Matthew Roeschke <[email protected]> * updated v3.0.0.rst and fixed tm.assert_produces_warning * fixed test_new_style_with_mixed_types_in_column added match to assert_produces_warning * Update doc/source/whatsnew/v3.0.0.rst (removed reference to this PR) Co-authored-by: Simon Hawkins <[email protected]> * fixed pre-commit * removed tm.assert_produces_warning * fixed space * fixed pre-commit --------- Co-authored-by: Pedro Santos <[email protected]> Co-authored-by: Matthew Roeschke <[email protected]> Co-authored-by: Simon Hawkins <[email protected]>
…andas-dev#61727) * TST: update expecteds for using_string_dtype to fix xfails * Update to_dict_of_blocks test to hardcode object dtype * Comment * Split test, update expected, targeted xfails * Update json test * revert commented-out
* DOC: Update link to pytz documentation * Update the pytz link per the suggestion
…ment (pandas-dev#61827) * DOC: Correct error message in AbstractMethodError for methodtype argument * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
fix(doc): rm excessive backtick
…e' and 'Docs' properly (pandas-dev#61836) * DOC: Update README.md to proper link to issues related to Docs * DOC: Update README.md to proper link to issues related to 'good first issue'
…row/fastparquet engine keyword) (pandas-dev#61877)
…ar error (pandas-dev#61855) Co-authored-by: Khemkaran <[email protected]>
def test_describe_multiple_dtypes(self):
    """
    GH61707: describe() doesn't work on EAs which generate
    statistics with multiple dtypes.
Nitpick: can this be a comment instead of a docstring?
@@ -215,6 +216,14 @@ def reorder_columns(ldesc: Sequence[Series]) -> list[Hashable]:
    return names


def has_multiple_internal_dtypes(d: list[Any]) -> bool:
I think this can be inlined since it is only used once.
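The body of `has_multiple_internal_dtypes` is not shown in this hunk. As a rough illustration only, a helper with that signature could compare the Python types of the entries it receives. The body below is an assumption, not the code from this PR, and it approximates "internal dtype" by `type(x)`.

```python
from typing import Any


def has_multiple_internal_dtypes(d: list[Any]) -> bool:
    # Assumption: "internal dtype" is approximated by the Python type
    # of each entry; the PR's actual check may differ.
    return len({type(x) for x in d}) > 1


print(has_multiple_internal_dtypes([1.0, 2.0, 3.0]))  # False
print(has_multiple_internal_dtypes([3, 1.5, "kg"]))   # True
```

Under this approximation `[3, 1.5]` is also flagged, because `int` and `float` are different types even though both convert cleanly to float.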
@@ -251,6 +260,10 @@ def describe_numeric_1d(series: Series, percentiles: Sequence[float]) -> Series:
        import pyarrow as pa

        dtype = ArrowDtype(pa.float64())
    elif has_multiple_internal_dtypes(d):
        # GH61707: describe() doesn't work on EAs
        # with multiple internal dtypes, so return object dtype
Is the relevant characteristic "multiple internal dtypes" or "entries that can't be cast to Float64"?
The latter makes more sense.
This PR fixes a bug where `Series.describe()` fails on certain `ExtensionArray` dtypes such as `pint[kg]`, due to attempting to cast the result to `Float64Dtype`. This is because some of the produced statistics are not castable to float, which raises errors like `DimensionalityError`.

We now avoid forcing a `Float64Dtype` return dtype when the EA's scalar values cannot be safely cast. Instead:

- If the EA produces outputs with mixed dtypes, the result is returned with `dtype=None`.
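The dtype choice described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pandas implementation: `Quantity` is a hypothetical stand-in for a unit-carrying scalar (it is not pint's class), `castable_to_float` and `choose_result_dtype` are made-up helper names, and the string `"Float64"` stands in for `Float64Dtype`.

```python
from typing import Any, Optional


class Quantity:
    """Hypothetical stand-in for a unit-carrying scalar (not pint's class)."""

    def __init__(self, magnitude: float, unit: str) -> None:
        self.magnitude = magnitude
        self.unit = unit

    def __float__(self) -> float:
        # Refuse to silently drop the unit, as a unit library might.
        raise TypeError(f"cannot convert quantity with unit {self.unit!r} to float")


def castable_to_float(value: Any) -> bool:
    """Return True if float(value) succeeds."""
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True


def choose_result_dtype(stats: list[Any]) -> Optional[str]:
    """Return "Float64" when every statistic casts to float, else None."""
    return "Float64" if all(castable_to_float(s) for s in stats) else None


plain_stats = [3, 2.0, 1.0, 1.0, 3.0]
unit_stats = [3, Quantity(2.0, "kg"), Quantity(1.0, "kg")]

print(choose_result_dtype(plain_stats))  # Float64
print(choose_result_dtype(unit_stats))   # None
```

Returning `None` here mirrors the `dtype=None` fallback in the description: the caller leaves dtype inference to the container instead of forcing a float cast that would raise.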